OrpheusDB: Bolt-on Versioning for Relational Databases
Authors
Abstract
Data science teams often collaboratively analyze datasets, generating dataset versions at each stage of iterative exploration and analysis. There is a pressing need for a system that can support dataset versioning, enabling such teams to efficiently store, track, and query across dataset versions. While git and svn are highly effective at managing code, they are not capable of managing large unordered structured datasets efficiently, nor do they support analytic (SQL) queries on such datasets. We introduce ORPHEUSDB, a dataset version control system that “bolts on” versioning capabilities to a traditional relational database system, thereby gaining the analytics capabilities of the database “for free”, while the database itself is unaware of the presence of dataset versions. We develop and evaluate multiple data models for representing versioned data, as well as a light-weight partitioning scheme, LYRESPLIT, to further optimize the models for reduced query latencies. With LYRESPLIT, ORPHEUSDB is on average 10× faster in finding effective (and better) partitionings than competing approaches, while also reducing the latency of version retrieval by up to 20× relative to schemes without partitioning. LYRESPLIT can be applied in an online fashion as new versions are added, alongside an intelligent migration scheme that reduces migration time by 10× on average.
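The "bolt-on" idea can be illustrated with a minimal sketch: versions are stored as ordinary tables, so the database needs no knowledge of versioning and plain SQL can still query across versions. The schema below (a `records` table plus a `membership` table mapping version ids to record ids), the column names, and the helper functions are all hypothetical choices made for illustration. The abstract does not specify them, and this is not one of the data models the paper evaluates, nor does it involve LYRESPLIT.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding two versions of a tiny dataset."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: record contents are stored once, and a separate
    # table lists which records belong to which version.
    conn.executescript(
        """
        CREATE TABLE records (rid INTEGER PRIMARY KEY, gene TEXT, score REAL);
        CREATE TABLE membership (vid INTEGER, rid INTEGER, PRIMARY KEY (vid, rid));
        """
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [(1, "a", 0.1), (2, "b", 0.2), (3, "b", 0.9)],
    )
    # Version 1 holds records 1 and 2; version 2 swaps record 2 for record 3.
    conn.executemany(
        "INSERT INTO membership VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (2, 3)],
    )
    conn.commit()
    return conn


def checkout(conn: sqlite3.Connection, vid: int) -> list:
    """Materialize one version by joining its membership rows with the records."""
    return conn.execute(
        "SELECT r.rid, r.gene, r.score "
        "FROM records r JOIN membership m ON r.rid = m.rid "
        "WHERE m.vid = ? ORDER BY r.rid",
        (vid,),
    ).fetchall()


def versions_with_score_at_least(conn: sqlite3.Connection, min_score: float) -> list:
    """Query across versions: ids of versions containing any record with score >= min_score."""
    rows = conn.execute(
        "SELECT DISTINCT m.vid "
        "FROM membership m JOIN records r ON r.rid = m.rid "
        "WHERE r.score >= ? ORDER BY m.vid",
        (min_score,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, retrieving a version costs a join over the whole `records` table. That per-checkout cost is the kind of latency the abstract says partitioning is meant to reduce.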
Similar resources
Towards a Model for Spatio-Temporal Schema
Schema versioning provides a mechanism for handling change in the structure of database systems and has been investigated widely, both in the context of static and temporal databases. With the growing interest in spatial and spatio-temporal data as well as the mechanisms for holding such data, the spatial context within which data is formatted also becomes an issue. This paper presents a genera...
On Schema Versioning in Temporal Databases
The support of schema versioning has been considered in the literature on temporal databases only to a limited extent. In particular, solutions for managing schema versions along transaction-time as different interfaces on the same temporal data were proposed so far. In this paper we investigate the distinct functionalities of new solutions for schema versioning along valid- and transaction-time ...
A formal model for temporal schema versioning in object-oriented databases
The problem of supporting temporal schema versioning has been extensively studied in the context of the relational model. In the object-oriented environment, previous works were devoted to the study of the different aspects of schema evolution or (non-temporal) versioning in branching models, due to the traditional origination of the object-oriented model from CAD/CAM and CIM. Nowadays, the com...
Towards a Model for Spatio-Temporal Schema Selection
Schema versioning provides a mechanism for handling change in the structure of database systems and has been investigated widely, both in the context of static and temporal databases. With the growing interest in spatial and spatio-temporal data as well as the mechanisms for holding such data, the spatial context within which data is formatted also becomes an issue. This paper presents a genera...
A Taxonomy for Schema Versioning Based on the Relational and Entity Relationship Models
Recently there has been increasing interest in both the problems and the potential of accommodating evolving schema in databases, especially in systems which necessitate a high volume of structural changes or where structural change is difficult. This paper presents a taxonomy of changes applicable to the Entity-Relationship Model together with their effects on the underlying relational model e...
Journal title:
- PVLDB
Volume 10, Issue
Pages -
Publication date 2017